This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

fix nf4 performance in hybrid CPU #120

Closed
wants to merge 5 commits into from

yuchengliu1
Contributor

Type of Change

Automatically adjust the thread count and avoid running nf4 on E-cores when running inference with an nf4 model on a hybrid CPU.
Quantizing a model to nf4 on a hybrid CPU now emits a warning.
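
A minimal sketch of the behavior described above, not the code from this PR. `CpuTopology`, `pick_thread_count`, and `warn_if_slow_quantization` are hypothetical names used only for illustration. The sketch also assumes that worker threads are placed on P-cores first, so capping the thread count at the P-core count is enough to keep nf4 work off the E-cores.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTopology:
    """Hypothetical CPU description; not a type from this repository."""
    p_cores: int
    e_cores: int

    @property
    def is_hybrid(self) -> bool:
        # A hybrid CPU has both performance and efficiency cores.
        return self.p_cores > 0 and self.e_cores > 0


def pick_thread_count(requested: int, weight_dtype: str, cpu: CpuTopology) -> int:
    """Clamp the requested thread count; for nf4 on a hybrid CPU, cap it at the P-core count."""
    total = cpu.p_cores + cpu.e_cores
    requested = max(1, min(requested, total))
    if cpu.is_hybrid and weight_dtype == "nf4":
        # Assumption: threads fill P-cores first, so this keeps nf4 off E-cores.
        return min(requested, cpu.p_cores)
    return requested


def warn_if_slow_quantization(weight_dtype: str, cpu: CpuTopology) -> bool:
    """Emit a warning when quantizing to nf4 on a hybrid CPU; return True if warned."""
    if cpu.is_hybrid and weight_dtype == "nf4":
        warnings.warn(
            "nf4 performs poorly on the E-cores of a hybrid CPU",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

For example, with 8 P-cores and 8 E-cores, a request for 16 threads on an nf4 model would be reduced to 8 in this sketch, while other weight types keep all 16.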

Description

detail description
Issues: xxx

Expected Behavior & Potential Risk

the expected behavior that triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Contributor

@zhewang1-intc zhewang1-intc left a comment

Please note that the performance of fp4_e2m1 and fp4_bnb is also poor on E-cores.

@kevinintel
Contributor

kevinintel commented Feb 8, 2024

Please remember to use the int4 config as the default on hybrid CPUs.

@kevinintel kevinintel requested a review from luoyu-intel March 14, 2024 02:58
@luoyu-intel
Contributor

@yuchengliu1 Can you check whether the new thread pool dispatches all jobs to P-cores if E-cores have poor performance?
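
To make the question concrete, here is a rough sketch of one way a scheduler could split jobs between core types. It is not the thread pool in this repository. `split_jobs`, `e_to_p_ratio` (measured E-core throughput relative to a P-core), and the `min_ratio` cutoff are all hypothetical. Jobs are shared in proportion to per-core throughput, and E-cores get nothing once their relative throughput falls below the cutoff.

```python
from typing import List


def split_jobs(
    n_jobs: int,
    p_cores: int,
    e_cores: int,
    e_to_p_ratio: float,
    min_ratio: float = 0.25,  # hypothetical cutoff, not a project setting
) -> List[int]:
    """Return per-core job counts, P-cores first, then E-cores."""
    if p_cores <= 0:
        raise ValueError("at least one P-core is required")
    # Each P-core has weight 1.0; E-cores are dropped when they are too slow.
    e_weight = e_to_p_ratio if e_to_p_ratio >= min_ratio else 0.0
    weights = [1.0] * p_cores + [e_weight] * e_cores
    total = sum(weights)
    # Give every core the floor of its proportional share.
    shares = [int(n_jobs * w / total) for w in weights]
    # Hand the remaining jobs to the fastest cores first (stable sort keeps P-cores ahead).
    leftover = n_jobs - sum(shares)
    fastest_first = sorted(range(len(weights)), key=lambda i: -weights[i])
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares
```

With `e_to_p_ratio=0.1` and the default cutoff, `split_jobs(16, 4, 4, 0.1)` assigns four jobs to each P-core and none to the E-cores.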

@luoyu-intel
Contributor

f4 performance will be remeasured after #172.

@yuchengliu1
Contributor Author

> @yuchengliu1 Can you check whether the new thread pool dispatches all jobs to P-cores if E-cores have poor performance?

Some parallel sections created with schedule2D do not dispatch that way, for example the ones for PrologueA and MHA reorder.

@luoyu-intel
Contributor

> > @yuchengliu1 Can you check whether the new thread pool dispatches all jobs to P-cores if E-cores have poor performance?
>
> Some parallel sections created with schedule2D do not dispatch that way, for example the ones for PrologueA and MHA reorder.

@yuchengliu1 Just run the benchmark and focus on GEMM only.

6 participants